-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-48019][SQL][FOLLOWUP] Use primitive arrays over object arrays when nulls exist #46372
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@cloud-fan Could you take a look at this followup to improve the primitive array with nulls issue. Thanks! |
dongjoon-hyun
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for making a PR, @gene-db .
- Could you provide a test case to validate your contribution and to prevent any future regression at this area?
- I'd like to recommend to use
[SQL]in SQL module PR title next time.
Yes, the previous PR added many tests in this area, to make sure it is correct. The reason I am updating in this PR is to go back to using primitive arrays (and still produce correct results with the tests), which are faster than the object ones.
Ok, I will add the appropriate module name next time. Thanks! |
|
To @gene-db , for the following purpose, you should not use
|
|
Please revise the PR title and description. |
@dongjoon-hyun Updated the title and description. |
cloud-fan
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to change it back to use primitive arrays!
|
thanks, merging to master/3.5! |
…when nulls exist ### What changes were proposed in this pull request? This is a followup to #46254 . Instead of using object arrays when nulls are present, continue to use primitive arrays when appropriate. This PR sets the null bits appropriately for the primitive array copy. Primitive arrays are faster than object arrays and won't create unnecessary objects. ### Why are the changes needed? This will improve performance and memory usage, when nulls are present in the `ColumnarArray`. ### Does this PR introduce _any_ user-facing change? This is expected to be faster when copying `ColumnarArray`. ### How was this patch tested? Existing tests. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #46372 from gene-db/primitive-nulls. Authored-by: Gene Pang <[email protected]> Signed-off-by: Wenchen Fan <[email protected]> (cherry picked from commit bf2e254) Signed-off-by: Wenchen Fan <[email protected]>
…when nulls exist ### What changes were proposed in this pull request? This is a followup to #46254 . Instead of using object arrays when nulls are present, continue to use primitive arrays when appropriate. This PR sets the null bits appropriately for the primitive array copy. Primitive arrays are faster than object arrays and won't create unnecessary objects. ### Why are the changes needed? This will improve performance and memory usage, when nulls are present in the `ColumnarArray`. ### Does this PR introduce _any_ user-facing change? This is expected to be faster when copying `ColumnarArray`. ### How was this patch tested? Existing tests. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #46372 from gene-db/primitive-nulls. Authored-by: Gene Pang <[email protected]> Signed-off-by: Wenchen Fan <[email protected]> (cherry picked from commit bf2e254) Signed-off-by: Wenchen Fan <[email protected]>
What changes were proposed in this pull request?
This is a followup to #46254 . Instead of using object arrays when nulls are present, continue to use primitive arrays when appropriate. This PR sets the null bits appropriately for the primitive array copy.
Primitive arrays are faster than object arrays and won't create unnecessary objects.
Why are the changes needed?
This will improve performance and memory usage, when nulls are present in the
ColumnarArray.Does this PR introduce any user-facing change?
This is expected to be faster when copying
ColumnarArray.How was this patch tested?
Existing tests.
Was this patch authored or co-authored using generative AI tooling?
No.